Second-Order Optimization over the Multivariate Gaussian Distribution
Authors
Abstract
We discuss the optimization of the stochastic relaxation of a real-valued function, i.e., we introduce a new search space given by a statistical model and we optimize the expected value of the original function with respect to a distribution in the model. From the point of view of Information Geometry, statistical models are Riemannian manifolds of distributions endowed with the Fisher information metric, so the stochastic relaxation can be seen as a continuous optimization problem defined over a differentiable manifold. In this paper we explore the second-order geometry of the exponential family, with applications to the multivariate Gaussian distributions, to generalize second-order optimization methods. Besides the Riemannian Hessian, we introduce the exponential and the mixture Hessians, which come from the dually flat structure of an exponential family. This allows us to obtain different Taylor formulæ according to the choice of the Hessian and of the geodesic used, and thus different approaches to the design of second-order methods, such as the Newton method.

In this paper we study the optimization of a real-valued function by means of its Stochastic Relaxation (SR), i.e., we search for the optimum of the function by optimizing the expected value of the function itself over a statistical model. This approach to optimization is very general and has been developed in many different fields: from statistical physics and random-search methods, e.g., the Gibbs sampler in optimization [1], simulated annealing, and the cross-entropy method [2]; to black-box optimization in evolutionary computation, e.g., Estimation of Distribution Algorithms [3] and evolutionary strategies [4–7]; through well-known techniques in polynomial optimization, such as the method of moments [8]. By optimizing the SR of a function, we move from the original search space to a new search space given by a statistical model, i.e., a set of probability densities. Once we introduce a parameterization for the statistical model, the parameters of the model become the new variables of the relaxed problem. Notice that the notion of stochastic relaxation differs from the common notion of relaxation in…
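As a concrete illustration of the SR just described, the sketch below is our own minimal example, not the authors' method: the toy objective, step size, and sample size are illustrative assumptions. It relaxes a function over the family of multivariate Gaussians N(μ, Σ) with Σ held fixed and follows a plain Monte Carlo estimate of the gradient of F(μ) = E[f(x)], obtained with the log-likelihood trick ∇_μ F = E[f(x) Σ⁻¹(x − μ)].

```python
# A minimal sketch of stochastic relaxation over a multivariate Gaussian.
# Illustrative assumptions: sphere objective, fixed covariance, constant
# step size, 500 Monte Carlo samples per iteration.
import numpy as np

def f(x):
    # Sphere function: a toy objective with its minimum at the origin.
    return np.sum(x ** 2, axis=-1)

rng = np.random.default_rng(0)
dim, n_samples, lr = 2, 500, 0.1
mu = np.array([3.0, -2.0])      # initial mean of the search distribution
sigma = np.eye(dim)             # covariance, held fixed in this sketch
sigma_inv = np.linalg.inv(sigma)

for step in range(200):
    x = rng.multivariate_normal(mu, sigma, size=n_samples)
    fx = f(x)
    baseline = fx.mean()        # variance-reduction baseline (still unbiased)
    # Monte Carlo estimate of grad_mu E[f] via the score Sigma^{-1}(x - mu).
    grad = ((fx - baseline)[:, None] * (x - mu) @ sigma_inv).mean(axis=0)
    mu = mu - lr * grad         # descend: we minimize the expected value

print(mu)  # approaches the optimum of f at the origin
```

The paper develops second-order versions of such updates. Note that for the Gaussian mean with fixed covariance, the Fisher information matrix is Σ⁻¹, so a natural-gradient step would simply precondition the estimate above by Σ, while Newton-type methods replace the fixed step size with curvature information from one of the Hessians discussed in the abstract (Riemannian, exponential, or mixture).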
Similar resources
Gradient Formulae for Nonlinear Probabilistic Constraints with Gaussian and Gaussian-Like Distributions
Probabilistic constraints represent a major model of stochastic optimization. A possible approach for solving probabilistically constrained optimization problems consists in applying nonlinear programming methods. In order to do so, one has to provide sufficiently precise approximations for values and gradients of probability functions. For linear probabilistic constraints under Gaussian distri...
Nonorthogonal Independent Vector Analysis Using Multivariate Gaussian Model
We consider the problem of joint blind source separation of multiple datasets and introduce an effective solution to the problem. We pose the problem in an independent vector analysis (IVA) framework utilizing the multivariate Gaussian source vector distribution. We provide a new general IVA implementation using a decoupled nonorthogonal optimization algorithm and establish the connection betwe...
Multivariate parametric density estimation based on the modified Cramér-von Mises distance
In this paper, a novel distance-based density estimation method is proposed, which considers the overall density function in the goodness-of-fit. In detail, the parameters of Gaussian mixture densities are estimated from samples, based on the distance of the cumulative distributions over the entire state space. Due to the ambiguous definition of the standard multivariate cumulative distribution...
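To make the idea of distance-based fitting concrete, here is a hedged one-dimensional sketch; it is not the paper's algorithm, which uses a modified Cramér-von Mises distance in the multivariate setting. The mixture size, starting point, and optimizer below are illustrative assumptions: we fit a two-component Gaussian mixture by minimizing a CvM-type distance between the model CDF and the empirical CDF of the samples.

```python
# Illustrative 1-D sketch of CDF-distance density estimation (assumptions:
# two-component mixture, Nelder-Mead optimizer, synthetic data).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(2)
data = np.sort(np.concatenate([rng.normal(-2, 0.5, 300),
                               rng.normal(1, 1.0, 700)]))
ecdf = (np.arange(len(data)) + 0.5) / len(data)  # empirical CDF at samples

def cvm(params):
    m1, s1, m2, s2, w_logit = params
    w = 1 / (1 + np.exp(-w_logit))               # mixture weight in (0, 1)
    cdf = w * norm.cdf(data, m1, abs(s1) + 1e-9) \
        + (1 - w) * norm.cdf(data, m2, abs(s2) + 1e-9)
    return np.mean((cdf - ecdf) ** 2)            # CvM-type distance

res = minimize(cvm, x0=[-1.0, 1.0, 0.5, 1.0, 0.0], method="Nelder-Mead")
print(res.x)  # recovered means, scales, and weight logit
```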
Multivariate Cauchy EDA Optimisation
We consider Black-Box continuous optimization by Estimation of Distribution Algorithms (EDA). In continuous EDA, the multivariate Gaussian distribution is widely used as a search operator, and it has the well-known advantage of modelling the correlation structure of the search variables, which univariate EDA lacks. However, the Gaussian distribution as a search operator is prone to premature co...
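The baseline the Cauchy variant aims to improve is easy to state in code. The following sketch is our own illustration, not the paper's algorithm; the objective, population size, and selection scheme are assumptions. It shows a basic continuous Gaussian EDA: sample from the model, select the best points, refit the model. Because the covariance is refit to the selected (best) points, it tends to shrink each generation, which is exactly the premature-convergence failure mode that heavier-tailed Cauchy sampling mitigates.

```python
# Illustrative sketch of a plain Gaussian EDA with truncation selection
# (assumptions: Rastrigin-like objective, population 200, elite 40).
import numpy as np

def f(x):
    # Multimodal toy objective with its global minimum at the origin.
    return np.sum(x**2 + 10 * (1 - np.cos(2 * np.pi * x)), axis=-1)

rng = np.random.default_rng(1)
dim, pop, elite = 5, 200, 40
mu, cov = np.zeros(dim) + 4.0, 4.0 * np.eye(dim)

for gen in range(100):
    x = rng.multivariate_normal(mu, cov, size=pop)
    idx = np.argsort(f(x))[:elite]     # keep the best points (minimization)
    mu = x[idx].mean(axis=0)           # refit the mean
    cov = np.cov(x[idx], rowvar=False) + 1e-6 * np.eye(dim)  # refit covariance

print(mu, f(mu[None])[0])
```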
A Gradient Formula for Linear Chance Constraints Under Gaussian Distribution
We provide an explicit gradient formula for linear chance constraints under a (possibly singular) multivariate Gaussian distribution. This formula allows one to reduce the calculus of gradients to the calculus of values of the same type of chance constraints (in smaller dimension and with different distribution parameters). This is an important aspect for the numerical solution of stochastic op...
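For the simplest special case, a single linear constraint under a nonsingular Gaussian, the closed form and its gradient are standard and easy to check numerically; the sketch below is our illustration, not the paper's general (possibly singular, multi-row) formula. For ξ ~ N(μ, Σ) and φ(x) = P(ξᵀx ≤ b), one has φ(x) = Φ((b − μᵀx)/√(xᵀΣx)) for x ≠ 0, since ξᵀx is itself Gaussian, and the gradient follows by the chain rule.

```python
# Numerical check of the single-constraint chance-probability gradient
# (assumptions: 2-D nonsingular Gaussian, hand-picked mu, Sigma, b, x).
import numpy as np
from scipy.stats import norm

mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
b = 1.5

def phi(x):
    s = np.sqrt(x @ Sigma @ x)           # std. dev. of xi^T x
    return norm.cdf((b - mu @ x) / s)

def grad_phi(x):
    s = np.sqrt(x @ Sigma @ x)
    z = (b - mu @ x) / s
    # Chain rule: grad Phi(z(x)) = pdf(z) * grad z.
    return -norm.pdf(z) * (mu / s + z * (Sigma @ x) / s**2)

x = np.array([0.7, 1.2])
eps = 1e-6
fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
               for e in np.eye(2)])      # central finite differences
print(grad_phi(x), fd)                   # the two gradients agree closely
```

Note how the analytic gradient is again expressed through Gaussian CDF/PDF evaluations, which is the reduction of gradients to values of the same type of function that the snippet above describes.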